# EEGUnity Kernel Tutorial: Rich Metadata, `misc`, `stim`, and Annotations

This tutorial explains how to use EEGUnity kernels to inject dataset-specific metadata and channels in memory.

## 1. Design Principle

EEGUnity keeps **locator metadata as the source of truth**:

- `format_channel_names()` standardizes locator channels as `channel_type:channel_name`.
- `get_data_row()` uses locator metadata to overwrite raw metadata at load time.
- Kernels are applied **after** locator-driven metadata patching.

This allows online metadata maintenance without modifying source files.

## 2. What a Kernel Can Do

A kernel can:

- add or update `raw.info["description"]`
- add or adjust multiple `misc` channels
- add or adjust multiple `stim` channels
- add/update annotations
- **build a raw from scratch** for files that EEGUnity's parser cannot read (see Section 7)

### Standard kernel interface

```python
class SomeKernel:
    KERNEL_ID: str = "my-kernel-v1"

    def apply(self, udataset, raw, row):
        # raw is a loaded mne.io.BaseRaw; row is the locator pandas.Series
        ...
        return raw


KERNEL = SomeKernel()
```

By default, `apply()` is called for every file whose `Completeness Check` is **not** `Unavailable`. Section 7 describes the opt-in for Unavailable files.

## 3. Annotation vs `misc` vs `stim`

Use these three mechanisms for different semantics:

- `Annotations`: text labels mapped to time segments (`onset`, `duration`, `description`).
- `misc` channels: continuous values over time (for example probability density, reaction-time trajectory).
- `stim` channels: integer event codes over time (for example class sequence 1/2/3).

To attach a single scalar value to one segment, fill the samples that segment covers in a `misc` channel.

## 4. Example Kernel with Multiple `misc` and `stim` Channels

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np
import mne


def add_channel(raw: mne.io.BaseRaw, ch_name: str, ch_type: str, values: np.ndarray) -> mne.io.BaseRaw:
    """Append one channel to raw with explicit MNE channel type."""
    if values.ndim != 1:
        raise ValueError("values must be a 1D array")
    if values.shape[0] != raw.n_times:
        raise ValueError("values length must equal raw.n_times")
    info = mne.create_info([ch_name], sfreq=raw.info["sfreq"], ch_types=[ch_type])
    ch_raw = mne.io.RawArray(values[np.newaxis, :], info, verbose=False)
    raw.add_channels([ch_raw], force_update_info=True)
    return raw


@dataclass
class ExampleKernel:
    KERNEL_ID: str = "example_rich_meta"

    def apply(self, udataset, raw: mne.io.BaseRaw, row):
        n = raw.n_times

        # misc channels (continuous signals)
        prob_density = np.linspace(0.1, 0.9, n, dtype=float)
        reaction_time = np.full(n, 0.42, dtype=float)
        raw = add_channel(raw, "prob_density", "misc", prob_density)
        raw = add_channel(raw, "reaction_time", "misc", reaction_time)

        # stim channels (integer codes)
        task_code = np.zeros(n, dtype=float)
        task_code[n // 4: n // 2] = 1
        task_code[n // 2: 3 * n // 4] = 2
        task_code[3 * n // 4:] = 3
        stage_code = np.zeros(n, dtype=float)
        stage_code[n // 3: 2 * n // 3] = 7
        raw = add_channel(raw, "task_code", "stim", task_code)
        raw = add_channel(raw, "stage_code", "stim", stage_code)

        # annotation segments (text semantics)
        ann = mne.Annotations(
            onset=[0.0, raw.times[n // 2]],
            duration=[2.0, 2.0],
            description=["trial_start", "feedback"],
        )
        # set_annotations() replaces any annotations already on raw
        raw.set_annotations(ann)
        return raw


KERNEL = ExampleKernel()
```

## 5. Binding and Running

```python
from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    kernel_spec=r"path/to/example_kernel.py",
)

# Parser path
raw0 = ud.eeg_parser.get_data(0)

# Batch path (kernel is also applied when loading row data in batch methods)
ud.eeg_batch.get_file_hashes(data_stream=True)
```

## 6. Channel Type Compatibility

EEGUnity standard prefixes are lowercase MNE-style (`eeg`, `eog`, `emg`, `ecg`, `meg`, `stim`, `misc`, `bio`). Locator entries also accept explicit MNE channel type strings, for example:

- `seeg:LA1`
- `ecog:G1`
- `dbs:DBS1`
- `fnirs_od:S1_D1_760`
- `pupil:pupil_left`
- `misc:prob_density`
- `stim:task_code`

Legacy uppercase prefixes (`EEG`, `EOG`, `EMG`, `ECG`, `STIM`, `Unknown`) are accepted for backward compatibility.

## 7. Extended Interface: Handling Unavailable Files

EEGUnity marks files as `Completeness Check = Unavailable` when its built-in parser cannot determine the sampling rate (e.g., headerless CSV files, proprietary binary formats). By default, kernels are **not** called for Unavailable files.

For datasets where EEGUnity cannot parse the file format at all, a kernel can opt in to build the raw from scratch by implementing the **extended interface**:

| Attribute / Method | Required | Description |
|--------------------|----------|-------------|
| `HANDLES_UNAVAILABLE = True` | yes | Opt-in flag. Must be set to `True`. |
| `load(self, row) -> BaseRaw \| None` | yes | Called first for Unavailable files. Build and return a `mne.io.RawArray` from the raw file. Return `None` to skip this file. |
| `apply(self, udataset, raw, row)` | yes (same as always) | Called after `load()` completes, with the raw returned by `load()`. Use this for annotation injection and metadata enrichment, same as for Completed files. |

### Call sequence for Unavailable files

```
kernel.load(row)            → raw   (format parsing, build RawArray)
kernel.apply(ud, raw, row)  → raw   (enrichment: annotations, description, …)
```

For **Completed** files the call sequence is unchanged:

```
EEGUnity parser             → raw   (standard MNE loader)
kernel.apply(ud, raw, row)  → raw   (enrichment)
```

### Example: headerless CSV dataset

```python
from __future__ import annotations

import json
from dataclasses import dataclass

import mne
import numpy as np
import pandas as pd

_SFREQ = 2048.0
_CH_NAMES = ["EEG1", "EEG2"]


@dataclass
class HeaderlessCSVKernel:
    KERNEL_ID: str = "headerless-csv-v1"
    HANDLES_UNAVAILABLE: bool = True  # opt in

    def load(self, row) -> mne.io.BaseRaw | None:
        """Build a RawArray from a headerless CSV file."""
        file_path = row["File Path"]
        if not file_path.endswith(".csv"):
            return None  # skip non-CSV files silently

        # Read EEG columns (0-indexed: columns 1 and 2)
        df = pd.read_csv(file_path, header=None, usecols=[1, 2])
        eeg = df.to_numpy(dtype=float).T  # (n_ch, n_samples)
        info = mne.create_info(_CH_NAMES, sfreq=_SFREQ, ch_types=["eeg", "eeg"])
        return mne.io.RawArray(eeg, info, verbose=False)

    def apply(self, udataset, raw: mne.io.BaseRaw, row) -> mne.io.BaseRaw:
        """Inject metadata and annotations into the loaded raw."""
        raw.info["description"] = json.dumps({
            "eegunity_description": {
                "amplifier": "unknown",
                "cap": "unknown",
                "age": "unknown",
                "sex": "unknown",
                "handedness": "unknown",
            }
        })
        # … add annotations here …
        return raw


KERNEL = HeaderlessCSVKernel()
```

### Backward compatibility

Kernels that do **not** set `HANDLES_UNAVAILABLE = True` are never called for Unavailable files, so behaviour is identical to before this interface was added. Existing kernels require no changes.

## 8. Recommended Practice

- Use annotations for semantic event intervals.
- Use `stim` for integer-coded sequences.
- Use `misc` for continuous labels.
- Keep kernel logic dataset-specific and deterministic.
- For Unavailable-file support, put raw construction in `load()` and keep annotation/metadata logic in `apply()`, so that both code paths share the same enrichment step.
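
The split between `load()` and `apply()` recommended above can be illustrated with a toy dispatcher. This is a minimal sketch of the call sequences from Section 7, not EEGUnity's actual loader: `load_with_kernel`, `RecordingKernel`, the `parser` callable, and the plain dict standing in for an MNE raw object are all hypothetical, and returning `None` when a kernel has not opted in is an assumption made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingKernel:
    """Toy kernel (hypothetical) that records which hooks were called."""

    HANDLES_UNAVAILABLE: bool = True
    calls: list = field(default_factory=list)

    def load(self, row):
        # Stand-in for building a RawArray from an unparseable file.
        self.calls.append("load")
        return {"source": "kernel.load", "annotations": []}

    def apply(self, udataset, raw, row):
        # Enrichment step shared by both code paths.
        self.calls.append("apply")
        raw["annotations"].append("trial_start")
        return raw


def load_with_kernel(kernel, udataset, row, parser):
    """Hypothetical dispatcher mirroring the two call sequences in Section 7."""
    if row["Completeness Check"] == "Unavailable":
        if not getattr(kernel, "HANDLES_UNAVAILABLE", False):
            return None  # assumption: a kernel that did not opt in is never called
        raw = kernel.load(row)
        if raw is None:
            return None  # load() declined this file
    else:
        raw = parser(row)  # stand-in for the built-in EEGUnity parser
    return kernel.apply(udataset, raw, row)
```

Calling `load_with_kernel` with a row whose `Completeness Check` is `Unavailable` runs `load()` and then `apply()`; with any other value only `apply()` runs on the parser's output. Because enrichment lives only in `apply()`, both paths produce the same annotations.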